Architecture independent short vector FFTs
Authors
Abstract
This paper introduces a SIMD vectorization for FFTW (the “fastest Fourier transform in the west” by Matteo Frigo and Steven Johnson). The new method yields an architecture-independent short vector SIMD FFT vectorization that exploits FFTW's architecture adaptivity. It is based on special FFT kernels (sizes up to 64 and beyond) that FFTW uses to compute the whole transform. The vectorization supports all features of complex transforms in FFTW (arbitrary size, dimension, and stride of the data vector; in-place and out-of-place transforms) and is fully transparent to the user. It is suitable for any SIMD vector length of the underlying hardware.
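The abstract does not show the kernels themselves. As a rough illustration of one common way short vector SIMD is applied to FFTs, the sketch below runs a single size-4 DFT kernel on NU independent transforms at once, one transform per SIMD lane. Lanes are emulated with Python lists; `NU`, `vadd`, `vsub`, `vmul_const` and `dft4_kernel` are hypothetical names, and this is not necessarily how FFTW or the paper's method generates or schedules its kernels (the paper also vectorizes single transforms, which needs data reordering that this sketch omits).

```python
import cmath

NU = 4  # hypothetical SIMD vector length (number of lanes)

def vadd(a, b):
    """Lane-wise add: stands in for one SIMD add instruction."""
    return [x + y for x, y in zip(a, b)]

def vsub(a, b):
    """Lane-wise subtract: stands in for one SIMD subtract instruction."""
    return [x - y for x, y in zip(a, b)]

def vmul_const(a, w):
    """Multiply every lane by the same complex constant w."""
    return [x * w for x in a]

def dft4_kernel(x0, x1, x2, x3):
    """Size-4 forward DFT kernel whose four inputs are NU-lane vectors.
    Lane l of every input belongs to the l-th independent transform."""
    t0 = vadd(x0, x2)
    t1 = vsub(x0, x2)
    t2 = vadd(x1, x3)
    t3 = vmul_const(vsub(x1, x3), -1j)  # twiddle factor exp(-2*pi*i/4) = -i
    return [vadd(t0, t2), vadd(t1, t3), vsub(t0, t2), vsub(t1, t3)]

# NU independent length-4 signals, one per lane
signals = [[complex(lane + j, lane - j) for j in range(4)] for lane in range(NU)]
# Pack: vector n holds sample n of every signal
packed = [[signals[lane][n] for lane in range(NU)] for n in range(4)]
out = dft4_kernel(*packed)  # out[k][lane] is bin k of signal `lane`
```

The kernel issues the same number of vector operations whatever NU is, so on real SIMD hardware the arithmetic per transform shrinks as the vector length grows.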
Related articles
Automatic generation of prime length FFT programs
We describe a set of programs for circular convolution and prime length FFTs that are relatively short, possess great structure, share many computational procedures, and cover a large variety of lengths. The programs make clear the structure of the algorithms and clearly enumerate independent computational branches that can be performed in parallel. Moreover, each of these independent operation...
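The abstract links prime-length FFTs to circular convolution. A standard technique that makes this link explicit is Rader's algorithm, sketched below as an illustration; the paper's own generated programs are not shown here and may be organized differently. For a prime p, indices 1..p-1 are permuted by powers of a primitive root g, which turns the remaining DFT into a cyclic convolution of length p-1. The helper names (`prime_factors`, `primitive_root`, `cyclic_convolution`, `rader_dft`) are hypothetical.

```python
import cmath

def prime_factors(n):
    """Set of distinct prime factors of n (trial division)."""
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs

def primitive_root(p):
    """Smallest generator of the multiplicative group mod prime p."""
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1)):
            return g
    return 1  # only reached for p == 2, whose group is {1}

def cyclic_convolution(a, b):
    """Direct O(m^2) cyclic convolution: c[q] = sum_m a[m] * b[(q - m) mod m_len]."""
    n = len(a)
    return [sum(a[m] * b[(q - m) % n] for m in range(n)) for q in range(n)]

def rader_dft(x):
    """Forward DFT of prime length p via Rader's reduction to a cyclic convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, p - 2, p)  # inverse of g mod p (Fermat's little theorem)
    a = [x[pow(g, m, p)] for m in range(p - 1)]            # inputs permuted by g^m
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, m, p) / p)  # depends only on p
         for m in range(p - 1)]
    c = cyclic_convolution(a, b)
    X = [0j] * p
    X[0] = sum(x)
    for q in range(p - 1):
        X[pow(g_inv, q, p)] = x[0] + c[q]  # output index g^(-q)
    return X
```

Since `b` depends only on p, it can be precomputed, and the length-(p-1) convolution can itself be computed with fast convolution methods; the direct convolution here is only for clarity.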
Efficient FFTs on IRAM
Computing Fast Fourier Transforms (FFTs) is notoriously difficult on conventional general-purpose architectures because FFTs require high memory bandwidth and strided memory accesses. Since FFTs are important in signal processing, several DSPs have hardware support for performing FFTs; moreover, some DSPs are designed solely for the purpose of computing FFTs and related transforms. In this pape...
Fast Fourier Transform
A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) of an input vector. Efficient means that the FFT computes the DFT of an n-element vector in O(n log n) operations in contrast to the O(n^2) operations required for computing the DFT by definition. FFTs exist for any vector length n and for real and higher-dimensional data. Parallel FFTs have b...
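To make the O(n^2) versus O(n log n) contrast concrete, here is a minimal sketch comparing the DFT by definition with a recursive radix-2 Cooley-Tukey FFT. It assumes n is a power of two; other lengths need different algorithms (mixed-radix, Rader, Bluestein and so on). The function names are illustrative only.

```python
import cmath

def dft(x):
    """DFT by definition: n outputs, each a sum of n terms -> O(n^2) operations."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    Each recursion level does O(n) work over log2(n) levels -> O(n log n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

For n = 1024, the definition needs about a million complex multiply-adds, while the radix-2 recursion needs on the order of n log2 n ≈ 10,000 butterfly operations.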
Multiprocessor FFTs
Several multiprocessor FFTs are developed in this paper for both vector multiprocessors with shared memory and the hypercube. Two FFTs for vector multiprocessors are given that compute an ordered transform and have a stride of one except for a single "link" step. Since multiple FFTs provide additional options for both vectorization and distribution we show that a single FFT can be performed in ...
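The abstract notes that multiple FFTs give extra freedom for vectorization and distribution. One standard way to exploit this, shown here only as an illustration and not necessarily the paper's method, is the four-step factorization: a length n = n1·n2 transform becomes n2 independent length-n1 DFTs, a twiddle-factor scaling, n1 independent length-n2 DFTs, and a reordering. Each batch of independent small DFTs can be vectorized or spread across processors. The names `dft` and `four_step_fft` are hypothetical.

```python
import cmath

def dft(x):
    """Reference DFT by definition (used for the small sub-transforms)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """Forward DFT of length n1*n2 using input index j = n2*j1 + j2
    and output index k = k1 + n1*k2."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n2 independent length-n1 DFTs over stride-n2 subsequences
    y = [dft([x[n2 * j1 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Step 2: scale by twiddle factors exp(-2*pi*i*j2*k1/n)
    for j2 in range(n2):
        for k1 in range(n1):
            y[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Step 3: n1 independent length-n2 DFTs
    z = [dft([y[j2][k1] for j2 in range(n2)]) for k1 in range(n1)]
    # Step 4: reorder (a transpose) into natural output order
    X = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            X[k1 + n1 * k2] = z[k1][k2]
    return X
```

Steps 1 and 3 are each a batch of identical, independent transforms, which is exactly the shape that maps well onto vector units or multiple processors; the reordering in step 4 is where the data movement is concentrated.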
An Abstraction Layer for SIMD Extensions
This paper presents an abstraction layer for short vector SIMD ISA extensions like Intel’s SSE, AMD’s 3DNow!, Motorola’s AltiVec, and IBM’s Double Hummer. It provides unified access to short vector instructions via intermediate level building blocks. These primitives are C macros that allow, for instance, portable and highly efficient implementations of discrete linear transforms like FFTs and ...
Publication date: 2001